Bringing UMAP Closer to the Speed of Light with GPU Acceleration
The Uniform Manifold Approximation and Projection (UMAP) algorithm has become
widely popular for its ease of use, quality of results, and support for
exploratory, unsupervised, supervised, and semi-supervised learning. While many
algorithms can be ported to a GPU in a simple and direct fashion, such efforts
have resulted in inefficient and inaccurate versions of UMAP. We show a number
of techniques that can be used to make a faster and more faithful GPU version
of UMAP, and obtain speedups of up to 100x in practice. Many of these design
choices/lessons are general purpose and may inform the conversion of other
graph and manifold learning algorithms to use GPUs. Our implementation has been
made publicly available as part of the open source RAPIDS cuML library
(https://github.com/rapidsai/cuml).
cuSLINK: Single-linkage Agglomerative Clustering on the GPU
In this paper, we propose cuSLINK, a novel and state-of-the-art reformulation
of the SLINK algorithm on the GPU which requires only O(Nk) space and uses a
parameter k to trade off space and time. We also propose a set of novel and
reusable building blocks that compose cuSLINK. These building blocks include
highly optimized computational patterns for k-NN graph construction, spanning
trees, and dendrogram cluster extraction. We show how we used our primitives to
implement cuSLINK end-to-end on the GPU, further enabling a wide range of
real-world data mining and machine learning applications that were once
intractable. In addition to being a primary computational bottleneck in the
popular HDBSCAN algorithm, the impact of our end-to-end cuSLINK algorithm spans
a large range of important applications, including cluster analysis in social
and computer networks, natural language processing, and computer vision. Users
can obtain cuSLINK at
https://docs.rapids.ai/api/cuml/latest/api/#agglomerative-clustering
Comment: To appear in ECML PKDD 2023 by Springer Nature
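
As a rough illustration of the three building blocks named in the cuSLINK abstract (nearest-neighbor graph construction, spanning trees, and dendrogram cluster extraction), the following is a minimal CPU sketch in plain Python. It is not the cuSLINK implementation or the cuML API. The function names (knn_graph, spanning_forest, cut_tree) are hypothetical, the neighbor search is brute force, and Kruskal's algorithm is an assumed stand-in for the spanning-tree step. The sketch also assumes the neighbor count k is large enough that the graph is connected.

```python
import heapq
import math


def knn_graph(points, k):
    """Undirected edges (dist, i, j) joining each point to its k nearest neighbors.

    Brute force, O(n^2) distance evaluations; for illustration only.
    """
    edges = set()
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        for d, j in heapq.nsmallest(k, dists):
            edges.add((d, min(i, j), max(i, j)))
    return sorted(edges)  # ascending by distance


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def spanning_forest(n, edges):
    """Kruskal's algorithm over distance-sorted edges.

    Returns the kept edges in ascending order of distance. Read in that
    order, they are the merge sequence of a single-linkage dendrogram.
    """
    parent = list(range(n))
    tree = []
    for d, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            tree.append((d, i, j))
    return tree


def cut_tree(n, tree, n_clusters):
    """Flat labels: apply only the n - n_clusters shortest merges.

    Assumes `tree` spans all n points (n - 1 edges); otherwise more than
    n_clusters groups can come back.
    """
    parent = list(range(n))
    for _, i, j in tree[: n - n_clusters]:
        parent[find(parent, i)] = find(parent, j)
    relabel = {}
    return [relabel.setdefault(find(parent, i), len(relabel)) for i in range(n)]


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = knn_graph(points, k=3)
tree = spanning_forest(len(points), edges)
labels = cut_tree(len(points), tree, n_clusters=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this sketch the edge list holds at most n*k entries, which is one way to see why a k-neighbor graph lets k trade memory against the work needed to obtain a connected graph. How cuSLINK handles a disconnected graph is not covered here.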